Voyellation automatique de l'arabe (Automatic vocalization of Arabic)
Abstract
We tackle the problem of automatic, or at least assisted, vocalization, a problem that arises from the almost universal absence of vowels in Arabic texts. We show that the difficulty lies in the fact that most Arabic words admit several potential vocalizations and are therefore ambiguous. In essence, the problem reduces to choosing, in context, the correct vocalization from among several candidates. We focus here on the results obtained by starting with morphological analysis and proceeding to grammatical (part-of-speech) tagging. In the proposed system, vocalic ambiguity is detected by means of a double dictionary of voweled and non-voweled forms. Resolution begins with morphological analysis and continues through subsequent steps; the experiments described here cover the processing up to grammatical (part-of-speech) tagging.
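The idea of detecting ambiguity with a double dictionary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the four-word lexicon, and the choice to derive non-voweled forms by stripping the diacritics U+064B to U+0652 are all assumptions made for illustration. Each voweled form is indexed under its non-voweled spelling, and a non-voweled word counts as ambiguous when more than one voweled candidate is found.

```python
import re
from collections import defaultdict

# Assumption: "vowels" are the Arabic diacritics U+064B..U+0652
# (tanwin marks, fatha, damma, kasra, shadda, sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_vowels(word):
    """Return the non-voweled spelling of a voweled word."""
    return DIACRITICS.sub("", word)


def build_double_dictionary(voweled_forms):
    """Map each non-voweled form to the set of its voweled candidates."""
    table = defaultdict(set)
    for form in voweled_forms:
        table[strip_vowels(form)].add(form)
    return table


def candidates(table, unvoweled):
    """List the voweled candidates recorded for a non-voweled word."""
    return sorted(table.get(unvoweled, ()))


def is_ambiguous(table, unvoweled):
    """A word is ambiguous when it has more than one candidate."""
    return len(table.get(unvoweled, ())) > 1


# Hypothetical toy lexicon, written with escapes so the marks are explicit.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"   # kataba, "he wrote"
KUTUBUN = "\u0643\u064F\u062A\u064F\u0628\u064C"  # kutubun, "books"
KUTIBA = "\u0643\u064F\u062A\u0650\u0628\u064E"   # kutiba, "it was written"
QALAMUN = "\u0642\u064E\u0644\u064E\u0645\u064C"  # qalamun, "a pen"

KTB = "\u0643\u062A\u0628"  # non-voweled k-t-b
QLM = "\u0642\u0644\u0645"  # non-voweled q-l-m

TABLE = build_double_dictionary([KATABA, KUTUBUN, KUTIBA, QALAMUN])

if __name__ == "__main__":
    for word in (KTB, QLM):
        print(word, candidates(TABLE, word), is_ambiguous(TABLE, word))
```

In this toy lexicon the non-voweled k-t-b has three voweled candidates and q-l-m has one. Such a lookup only detects the ambiguity; choosing among the candidates is the job of the later steps the abstract mentions (morphological analysis, then part-of-speech tagging).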
Similar resources
Pre-processing and Language Analysis for Arabic to French Statistical Machine Translation (Traduction automatique statistique pour l'arabe-français améliorée par le prétraitement et l'analyse de la langue) [in French]
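De l'arabe standard vers l'arabe dialectal : projection de corpus et ressources linguistiques en vue du traitement automatique de l'oral dans les médias tunisiens (From Standard Arabic to Dialectal Arabic: corpus projection and linguistic resources for automatic speech processing in Tunisian media)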
In this work, we focus on the problems of automatically processing spoken language in Tunisian media. This speech is marked by code-switching between Modern Standard Arabic (MSA) and the Tunisian dialect (TD). Our goal is to build resources for training language models that can be used in automatic speech recognition applications. As it is a variant of MSA, we describe in this...
Smoothing methods for a morpho-statistical approach of automatic diacritization Arabic texts (Méthodes de lissage d'une approche morpho-statistique pour la voyellation automatique des textes arabes) [in French]
We present in this work a new three-stage approach to the automatic diacritization of Arabic texts. In the first stage, we integrated a lexical database containing the most frequent Arabic words with morphological analysis by Alkhalil Morpho Sys, which provided the possible diacritizations for each word. The objective of the second module is to eliminate the ambiguity using a statisti...
Étiquetage grammatical de l'arabe voyellé ou non (Part-of-speech tagging of voweled or unvoweled Arabic)
Abstract: We address the problem of part-of-speech tagging for Arabic by revisiting the commonly used methods, which are based on rules governing sequences of two or three grammatical tags. We show that the algorithms recommended for French or English cannot be reused as they stand, because Arabic poses two problems: the absence of...
Multilingual Summarization Experiments on English, Arabic and French (Résumé Automatique Multilingue Expérimentations sur l'Anglais, l'Arabe et le Français) [in French]
The task of multilingual summarization aims to design language-independent systems. Extractive methods are at the core of multilingual summarization systems. In this paper, we discuss the influence of several basic NLP tasks (sentence splitting, tokenization, stop-word removal, and stemming) on sentence scoring and summary coverage. Hence, we propose a statistical method which extracts most rel...
Publication date: 1998